Abstract



A specific implementation of Bayesian model averaging has recently been suggested as a method 
for the calibration of ensemble temperature forecasts. We point out the similarities between this new 
approach and an earlier method known as kernel regression. We also argue that the Bayesian model 
averaging method (as applied) has a number of flaws that would result in forecasts with suboptimally 
calibrated mean and uncertainty. 

1 Introduction 

There is significant demand within industry for adequate probabilistic forecasts of temperature. However,
this demand has not been met by the meteorological community and such forecasts are not commercially
available. A small number of forecast vendors do produce probabilistic forecasts, but the calibration
methods they use are flawed. A number of academic papers have suggested methods by which such forecasts
could be improved, but again the methods described are flawed. To attempt to remedy this situation we
run a program of research aimed at clarifying the issues involved in the creation of probabilistic
temperature forecasts and at developing methods that can be used to produce such forecasts. We are not
forecasters ourselves: our hope is that the forecasting community will use the methods we describe to
produce forecasts that we can then use in our industrial applications.

This article discusses a new method, Bayesian model averaging (BMA), that has recently
been proposed for the calibration of temperature forecast ensembles (see Raftery et al. (2003)). Our
purpose is twofold:

1. To point out the close connections between BMA and earlier methods known as kernel regression 
(KR) and kernel spread regression (KSR) 

2. To describe a number of flaws in the BMA approach that, we believe, render it
inappropriate as a method for the calibration of real forecast data

We start by describing the KR and BMA approaches. We then compare the two and point out the
problems we see in BMA. Finally we suggest some further methods that take features from both BMA
and KR that could be used to solve the calibration problem that is discussed in Raftery et al. (2003).

2 Kernel Regression 

Kernel regression (KR) was described by us in Jewson (2003). It is a method that takes an ensemble
forecast and turns it into a probabilistic forecast. The simplest reasonable way to do this is to use 
linear regression on the ensemble mean. KR is a simple extension of linear regression that allows for the 
representation of non-normality in the temperature distribution by putting a small kernel of optimum 
width around each ensemble member. The probability density forecast from KR can be written as:

$$p(x) = \sum_{i=1}^{M} p_i(x) \tag{1}$$

where the $p_i$ are the individual kernels given by

$$p_i(x) = \frac{1}{M \sqrt{2\pi}\,\lambda} \exp\left(-\frac{(x - x_i)^2}{2\lambda^2}\right) \tag{2}$$

where $x_i$ is the $i$'th ensemble member and $\lambda$ is the bandwidth (these equations come from equation 1
in Jewson (2003)).
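As a rough illustration of equations 1 and 2, the following Python sketch evaluates the KR density for a small ensemble and checks numerically that it integrates to one. Gaussian kernels are assumed, the member values and the bandwidth are invented for illustration, and the function name `kr_density` is ours rather than anything taken from Jewson (2003).

```python
import math

def kr_density(x, members, bandwidth):
    """Sum of Gaussian kernels of width `bandwidth`, one per ensemble member.

    Each kernel carries a factor 1/M so that the sum integrates to one.
    """
    m = len(members)
    norm = 1.0 / (m * math.sqrt(2.0 * math.pi) * bandwidth)
    return sum(norm * math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in members)

# Hypothetical ensemble of temperature anomalies (illustrative values only).
members = [-1.2, -0.4, 0.1, 0.9, 1.6]
bandwidth = 0.5

# Numerical check that the forecast density integrates to one (Riemann sum).
step = 0.01
grid = [-10.0 + step * k for k in range(2001)]
total = sum(kr_density(x, members, bandwidth) for x in grid) * step
print(f"integral of p(x) over [-10, 10] = {total:.6f}")
```

Reducing `bandwidth` in this sketch makes the density visibly multimodal, with one bump per member, while increasing it smooths the density towards a single mode.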

In addition to applying kernels in this way, the mean and the variance of the ensemble members are
calibrated using linear regression. We write the complete model as:

$$T_i \sim K(\alpha + \beta m_i, \gamma, \lambda) \tag{3}$$

KR calibrates the ensemble mean $m_i$ using linear regression (which gives an optimal combination between
the ensemble mean and climatology) and fixes the spread and the non-normality using the parameters $\gamma$
and $\lambda$. The parameter $\lambda$ is the bandwidth of the kernels used and controls the smoothness of the final
predicted distribution. Small values of $\lambda$ lead to a multimodal distribution while large values of $\lambda$ lead
to a unimodal smooth distribution.
The mean of the prediction from KR is given by:

$$E(x) = \alpha + \beta m_i \tag{4}$$

while the variance of the prediction, which is constant in time for the anomalies, is given by:

$$\mathrm{var}(x) = \lambda^2 + \frac{1}{M} \sum_{i=1}^{M} (x_i - \mu)^2 \tag{5}$$

or

variance of modelled temperatures = $\lambda^2$ + sample variance of calibrated ensemble members (6)
(this equation is equation 9 in Jewson (2003)).
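Equation 5 can be checked by simulation. The sketch below draws from a kernel forecast by picking a member at random and adding Gaussian noise with standard deviation equal to the bandwidth, then compares the variance of the draws with the right-hand side of equation 5. The member values, the bandwidth and the helper name `sample_kr` are illustrative assumptions, as is the use of Gaussian kernels.

```python
import random
import statistics

def sample_kr(members, bandwidth, n, rng):
    """Draw n values: pick a member at random, then add Gaussian kernel noise."""
    return [rng.choice(members) + rng.gauss(0.0, bandwidth) for _ in range(n)]

members = [-1.2, -0.4, 0.1, 0.9, 1.6]   # hypothetical calibrated members
bandwidth = 0.5                          # hypothetical bandwidth (lambda)

rng = random.Random(0)                   # fixed seed so the run is repeatable
draws = sample_kr(members, bandwidth, 100_000, rng)

# Right-hand side of equation 5: lambda^2 plus the (1/M) variance of the members.
predicted = bandwidth ** 2 + statistics.pvariance(members)
empirical = statistics.pvariance(draws)
print(f"equation 5: {predicted:.3f}, Monte Carlo: {empirical:.3f}")
```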

An extension of KR that allows for the uncertainty to vary in time according to variations in the ensemble
spread is also described in Jewson (2003), and can be written as

$$T_i \sim K(\alpha + \beta m_i, \gamma + \delta s_i, \lambda) \tag{7}$$

This model, known as kernel spread regression (KSR), calibrates the ensemble spread $s_i$ by having separate
parameters for the mean and the variance of the spread. This was shown to be necessary in Jewson et al.

The predicted variance from KSR is:

$$\mathrm{var}(x) = \lambda^2 + \frac{1}{M} \sum_{i=1}^{M} (x_i - \mu)^2 = \lambda^2 + (\gamma + \delta s_i)^2 \tag{8}$$
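One way to read equations 7 and 8 is sketched below: the raw members are shifted and rescaled so that their mean is alpha + beta*m and their spread is gamma + delta*s, and kernels of bandwidth lambda are then placed around the rescaled members. This rescaling step is our assumption about how the calibrated members are formed, the parameter values are invented, and the function names are hypothetical.

```python
import statistics

def ksr_calibrate(members, alpha, beta, gamma, delta):
    """Shift and rescale members to mean alpha + beta*m and spread gamma + delta*s."""
    m = statistics.mean(members)
    s = statistics.pstdev(members)      # raw ensemble spread (must be > 0)
    target_mean = alpha + beta * m
    target_spread = gamma + delta * s
    return [target_mean + target_spread * (x - m) / s for x in members]

def ksr_variance(members, alpha, beta, gamma, delta, bandwidth):
    """Predicted variance: kernel variance plus variance of the calibrated members."""
    calibrated = ksr_calibrate(members, alpha, beta, gamma, delta)
    return bandwidth ** 2 + statistics.pvariance(calibrated)

members = [-1.2, -0.4, 0.1, 0.9, 1.6]                      # hypothetical raw ensemble
params = dict(alpha=0.1, beta=0.8, gamma=0.6, delta=0.3)   # hypothetical fitted values
lam = 0.5                                                  # hypothetical bandwidth

s = statistics.pstdev(members)
lhs = ksr_variance(members, bandwidth=lam, **params)
rhs = lam ** 2 + (params["gamma"] + params["delta"] * s) ** 2   # equation 8
print(lhs, rhs)
```

Setting `delta=0` in this sketch recovers KR, for which the predicted variance no longer depends on the raw ensemble spread.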



3 Bayesian model averaging 

BMA is a general approach for combining the results from several statistical models using weights (Hoeting et al.).

There are a number of ways that BMA could be used in the creation of probabilistic forecasts.
We will discuss the particular application of BMA given in Raftery et al. (2003). The conclusions we will
draw do not apply to BMA in general, but only to this particular way of using BMA.
The suggestion in Raftery et al. (2003) is that the probability density of future temperatures can be
modelled as a weighted sum of a number of probability densities from different sources:

$$p(x) = \sum_{i=1}^{M} w_i g_i(x) \tag{9}$$

where

$$g_i(x) \sim N(x_i, \sigma_i^2) \tag{10}$$

and the $x_i$ are the ensemble members (these equations are equations 2 and 3 from Raftery et al. (2003),
written in our notation).

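The variance of the probabilistic forecast is then given by

$$\mathrm{var}(x) = \sum_{i=1}^{M} w_i (x_i - \mu)^2 + \sum_{i=1}^{M} w_i \sigma_i^2 \tag{11}$$

where $\mu = \sum_{i=1}^{M} w_i x_i$ is the predicted mean
(this is equation 7 from Raftery et al. (2003)).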
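Equation 11 is the usual decomposition of a mixture variance into a between-member term and a within-kernel term. A minimal Python sketch, with a hypothetical function name and made-up inputs, is:

```python
def bma_mean_variance(members, weights, sigmas):
    """Mean and variance of the mixture sum_i w_i N(x_i, sigma_i^2)."""
    mu = sum(w * x for w, x in zip(weights, members))
    between = sum(w * (x - mu) ** 2 for w, x in zip(weights, members))  # spread of members
    within = sum(w * s ** 2 for w, s in zip(weights, sigmas))           # kernel variance
    return mu, between + within

# Hypothetical two-member example with unequal weights (which must sum to one).
mu, var = bma_mean_variance(members=[0.0, 2.0], weights=[0.25, 0.75], sigmas=[1.0, 1.0])
print(mu, var)  # prints 1.5 1.75
```

With equal weights and a common kernel width the same function returns the ensemble mean and the sum of the kernel variance and the ensemble variance.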



4 The connection between BMA and KR 



We now consider how BMA and KR are related. To see the connection we consider a case where the 
individual forecasts are statistically identical. BMA also considers the more general case where the 
forecasts are statistically different although we will argue that since it doesn't work in the simplest case 
of identical members it certainly can't be expected to work in the more complex cases. 
If the forecasts are statistically identical then we can assume that the BMA weights and $\sigma_i$'s are equal:

$$w_i = \frac{1}{M} \tag{12}$$

$$\sigma_i = \sigma \tag{13}$$

Equation 9 now gives:

$$p(x) = \sum_{i=1}^{M} \frac{1}{M} g_i(x) \tag{14}$$

and we can see that this agrees with equation 1 if we define $g_i(x) = M p_i(x)$, i.e. if we normalise the kernels
differently. So this part of the two models is the same up to a simple definition of the normalisation.
The BMA predicted mean is just

$$E(x) = \frac{1}{M} \sum_{i=1}^{M} x_i = \mu \tag{15}$$

i.e. the ensemble mean, and the BMA predicted variance is

$$\mathrm{var}(x) = \frac{1}{M} \sum_{i=1}^{M} (x_i - \mu)^2 + \sigma^2 \tag{16}$$

We can now see the similarities and differences between BMA and kernel regression very clearly. 

1. By comparing equation 4 with equation 15 we see that BMA predicts the expected temperature using
the ensemble mean while KR predicts the expected temperature using an optimum combination
of the ensemble mean with climatology.

2. By comparing equations 5 and 8 with equation 16 we see that BMA calibrates the mean level of
uncertainty, the variability of the uncertainty and the smoothness of the distribution using a single
parameter $\sigma$. KR uses two parameters to calibrate the mean level of uncertainty and the smoothness
while KSR uses three parameters to calibrate the mean level of uncertainty, the variability of the
uncertainty and the smoothness.

BMA (when applied to the identical members case) is a special case of KSR in which $\beta = 1$, $\gamma = 0$ and
$\delta = 1$.
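The special-case relationship can be checked numerically. The sketch below compares the equal-weight BMA density with a KSR density evaluated at beta = 1, gamma = 0 and delta = 1. It additionally takes alpha = 0 and sets the KSR bandwidth equal to the BMA kernel width; these two choices are implied by the comparison of the means and variances but are our assumptions. The rescaling used for KSR and all function names are also illustrative.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def bma_density(x, members, sigma):
    """Equation 14: equal weights 1/M and a common kernel width sigma."""
    return sum(normal_pdf(x, xi, sigma) for xi in members) / len(members)

def ksr_density(x, members, alpha, beta, gamma, delta, lam):
    """KSR density: rescale members (assumed form), then add kernels of width lam."""
    m = sum(members) / len(members)
    s = math.sqrt(sum((xi - m) ** 2 for xi in members) / len(members))
    calibrated = [alpha + beta * m + (gamma + delta * s) * (xi - m) / s for xi in members]
    return sum(normal_pdf(x, ci, lam) for ci in calibrated) / len(calibrated)

members = [-1.2, -0.4, 0.1, 0.9, 1.6]   # hypothetical ensemble
sigma = 0.7                              # hypothetical BMA kernel width

max_diff = max(
    abs(bma_density(x, members, sigma)
        - ksr_density(x, members, alpha=0.0, beta=1.0, gamma=0.0, delta=1.0, lam=sigma))
    for x in [-3.0, -1.0, 0.0, 0.5, 2.0, 4.0]
)
print(f"largest density difference: {max_diff:.2e}")
```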



5 The problems with BMA 

Unfortunately Bayesian model averaging seems to suffer from a number of flaws as a method for calibrating
temperature ensembles. These issues are discussed below: the research on which these conclusions are based
is summarised in Jewson (2004).

The first problem concerns the calibration of the ensemble mean. In the special case that we are considering,
BMA predicts the expected temperature using the ensemble mean. However, it is well documented
(Leith (1974), von Storch and Zwiers (1999), Jewson and Ziehmann (2003)) that the ensemble mean is
not the optimal forecast for the expected temperature: a 'damped' version of the ensemble mean calculated
using linear regression is better. This damping performs an optimal calibration of the ensemble
mean with climatology. An undamped ensemble mean such as that produced by BMA does not have the
correct variance and will not minimise RMSE.
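The effect of damping can be seen in a toy simulation. The sketch below uses synthetic anomalies in which the observed temperature and the ensemble mean share a common signal but have independent errors; the error sizes are arbitrary, and the regression is fitted and scored on the same data, so this illustrates the mechanism rather than giving a skill estimate.

```python
import math
import random

rng = random.Random(1)   # fixed seed so the run is repeatable
n = 5000

# Synthetic anomalies: a predictable signal plus independent errors (toy values).
signal = [rng.gauss(0.0, 1.0) for _ in range(n)]
truth = [s + rng.gauss(0.0, 0.5) for s in signal]      # observed temperature
ens_mean = [s + rng.gauss(0.0, 0.5) for s in signal]   # imperfect ensemble mean

def ols(x, y):
    """Least-squares intercept and slope for y = alpha + beta * x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    beta = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - beta * mx, beta

def rmse(pred, obs):
    """Root mean square error of predictions against observations."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

alpha, beta = ols(ens_mean, truth)
rmse_raw = rmse(ens_mean, truth)
rmse_damped = rmse([alpha + beta * m for m in ens_mean], truth)
print(f"beta = {beta:.2f}, raw RMSE = {rmse_raw:.3f}, damped RMSE = {rmse_damped:.3f}")
```

In this setup the fitted slope is below one, so the regression pulls the ensemble mean towards climatology (zero anomaly), and the damped forecast has the lower RMSE.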

The second problem concerns the calibration of the uncertainty. To correctly calibrate the uncertainty of 
a probabilistic forecast one needs to consider (at least) two operations. First, the temporal mean of the 
uncertainty must be fixed at an appropriate level. There is no information about the temporal mean of 
the uncertainty in the ensemble itself: this information can only come from past forecast error statistics. 
Secondly, the amplitude of the variability of the uncertainty must be fixed at an appropriate level. Again,
there is no information about the amplitude of the variability of the uncertainty in the ensemble itself:
this must be fitted from past forecast error statistics too. What the ensemble provides is then the relative
amplitude and phase of the fluctuations of the uncertainty.

The important point is that these two calibration steps (calibrating the mean and the amplitude of the 
variability of the spread) are independent. To set the mean level of the uncertainty correctly one typically 
needs to inflate the ensemble spread. However, to set the amplitude of the variations in the uncertainty 
correctly one may need to reduce the amplitude of the variations in the ensemble spread. A statistical 
model thus needs at least two parameters in order to calibrate spread correctly. If only one parameter is 
available, the calibration of the mean and the variability of the uncertainty will be mixed together, and 
the results will be somewhat arbitrary and very possibly less good than a calibration method that ignores 
the variability in the ensemble spread altogether. This mixing of different aspects of the calibration is 
what happens in BMA.¹

KR, KSR and BMA add another operation in the calibration of the ensemble, which is the smoothing of 
the ensemble towards or away from a normal distribution. If the bandwidth of the kernel ($\lambda$ in kernel
regression and $\sigma$ in BMA) is large then the ensemble is smoothed towards a normal while if the bandwidth
is small the probability forecast will likely be rather multimodal and will have a shape that depends more 
strongly on the distribution of the individual ensemble members. This smoothing operation needs a 
separate parameter to be performed correctly as it is an independent issue from the calibration of the 
uncertainty. KR and KSR use a separate parameter for this step while BMA uses the same parameter 
as is used to calibrate the uncertainty. 

In summary BMA only has a single free parameter ($\sigma$) rather than the three that are required to perform
the calibration that is being attempted. Thus the three operations that are being performed (calibration
of the mean level of the uncertainty, calibration of the variability of the uncertainty and calibration of the
smoothness of the forecast distribution) are mixed together. It is easy to imagine situations in which this
would cause problems. For instance it would not be possible for BMA to correctly calibrate an ensemble
for which the variability in the ensemble spread contains very little information (requiring a large value
of $\sigma$) but in which the temporal mean of the ensemble spread is close to the correct level (requiring a
small value of $\sigma$). Nor would it be possible for BMA to correctly calibrate an ensemble for which the
ensemble spread was larger than the actual uncertainty.
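The coupling can be made concrete with equation 16 applied day by day, so that the predicted standard deviation on day t is the square root of sigma squared plus the squared ensemble spread on that day. The sketch below uses an invented series of daily spreads and hypothetical helper names, and summarises the predicted uncertainty by its temporal mean and by its relative variability (standard deviation over time divided by the mean).

```python
import math
import statistics

# Hypothetical daily ensemble spreads s_t (standard deviations, illustrative values).
spreads = [0.6, 0.9, 1.4, 0.8, 1.1, 1.7, 0.7]

def bma_sd_series(spreads, sigma):
    """Predicted standard deviation per day from equation 16: sqrt(sigma^2 + s_t^2)."""
    return [math.sqrt(sigma ** 2 + s ** 2) for s in spreads]

def summarise(sds):
    """Temporal mean of the predicted sd and its relative variability."""
    mean = statistics.mean(sds)
    return mean, statistics.pstdev(sds) / mean

results = {sigma: summarise(bma_sd_series(spreads, sigma)) for sigma in (0.0, 0.5, 1.0, 2.0)}
for sigma, (mean_sd, rel_var) in results.items():
    print(f"sigma={sigma:.1f}  mean sd={mean_sd:.3f}  relative variability={rel_var:.3f}")
```

As sigma increases the mean level rises and the relative variability falls together, so the two cannot be set separately, and the predicted standard deviation is never smaller than the raw ensemble spread.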

6 The solution 

The solution to this problem is to use the correct number of free parameters for the calibration that is 
being attempted. Given only a single parameter the most sensible course of action seems to be to assume 
a normal distribution, ignore the variations in spread and use the parameter to represent the mean level 
of uncertainty. Given two parameters one should calibrate the mean and variability of the uncertainty, 
while still assuming a normal distribution. Finally given three parameters one can calibrate all three of 
the mean level of uncertainty, the variability of the uncertainty and the smoothness. 

7 Weighted kernel regression 

In Raftery et al. (2003) BMA was used to combine a number of forecasts that were not statistically
identical. We have argued that BMA does not calibrate correctly in the statistically identical case, and
so cannot be expected to work in more general cases either. How, then, should the original calibration
problem described in Raftery et al. (2003) be solved? The kernel regression models should not be used
as is since they assume that the forecasts are statistically identical.

One can imagine methods that take the best of the KSR and BMA approaches that might include one 
or more of the following features: 

• the mean is predicted using multiple linear regression on the anomalies 

• kernels with different widths are used on each ensemble member 

• the kernels could be combined with different weights 

• the uncertainty is predicted using some linear function of the weighted ensemble spread

¹ To be fair, we should note that this problem also arises in other forecast calibration methods that have been suggested
in the academic literature, such as the methods of Roulston and Smith (2003) and Mylne et al. (2002).



However, our previous experience of calibration suggests to us that much simpler models might perform 
just as well since the effects of non-normality and the benefit of using the spread may well both be small. 
In that case multiple linear regression on the anomalies is probably ideal, and whatever method is being 
used it should be compared with linear regression on the anomalies as an appropriate minimal model. 
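A minimal version of such a baseline is sketched below. For brevity it uses a single predictor (one ensemble mean) rather than multiple regression, a normal forecast distribution with constant variance, and the mean negative log-likelihood as the comparison score; the choice of score, the synthetic data and the function names are all our assumptions.

```python
import math
import random

def fit_baseline(ens_mean, obs):
    """Fit T ~ N(alpha + beta * m, sigma^2) by ordinary least squares."""
    n = len(obs)
    mx, my = sum(ens_mean) / n, sum(obs) / n
    beta = (sum((m - mx) * (t - my) for m, t in zip(ens_mean, obs))
            / sum((m - mx) ** 2 for m in ens_mean))
    alpha = my - beta * mx
    resid = [t - (alpha + beta * m) for m, t in zip(ens_mean, obs)]
    sigma = math.sqrt(sum(r * r for r in resid) / n)   # maximum-likelihood residual sd
    return alpha, beta, sigma

def mean_log_score(ens_mean, obs, alpha, beta, sigma):
    """Mean negative log-likelihood of the observations (lower is better)."""
    total = 0.0
    for m, t in zip(ens_mean, obs):
        z = (t - (alpha + beta * m)) / sigma
        total += 0.5 * z * z + math.log(sigma) + 0.5 * math.log(2.0 * math.pi)
    return total / len(obs)

# Synthetic anomalies standing in for real forecasts and observations.
rng = random.Random(2)
signal = [rng.gauss(0.0, 1.0) for _ in range(2000)]
ens_mean = [s + rng.gauss(0.0, 0.5) for s in signal]
obs = [s + rng.gauss(0.0, 0.5) for s in signal]

alpha, beta, sigma = fit_baseline(ens_mean, obs)
score = mean_log_score(ens_mean, obs, alpha, beta, sigma)
print(f"alpha={alpha:.3f} beta={beta:.3f} sigma={sigma:.3f} score={score:.3f}")
```

Any more elaborate calibration method can then be scored in the same way on held-out data and compared against this number.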



8 Summary 

We have discussed the question of how to produce probabilistic forecasts of temperature. In particular 
we have dissected the Bayesian model averaging approach of Raftery et al. (2003). This approach is very
similar to an earlier approach known as kernel regression (Jewson (2003)). We have argued that BMA does
not calibrate temperatures in an appropriate way. Neither the predicted mean nor the predicted variance
is constructed accurately. With respect to the predicted mean, the issue of 'damping' towards climatology
has been omitted. With respect to the variance, BMA mixes the separate functions of calibrating the
mean level of uncertainty, the amplitude of the variability of the uncertainty and the smoothness of the
forecast distribution into a single factor. We conclude that BMA (as applied in Raftery et al. (2003)) is
not a calibration method at all, but simply a method to fit a distribution to a set of ensemble members.
As such it is more or less the same as the well known kernel density of classical statistics.